Question 1 [2+2+2+2+2+2=12 marks total]


The following represents a sorted list of values for the age attribute from a set of 
data tuples: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 
35, 35, 36, 40, 45, 46, 52, 70. 


(a) What is the mean of the data? 

(b) What is the median of the data? 

(c) What is the mode? Comment on the modality. 

(d) What are (roughly) the 1st (Q1) and 3rd (Q3) quartiles? 
(e) What is the Interquartile Range (IQR)?
(f) What is the five-number summary? 
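
Answers to parts (a) to (f) can be checked with a short script. The following is a minimal sketch using only Python's standard `statistics` module (Python 3.8 or later). The variable names are illustrative, and the quartiles use the module's default "exclusive" method, which is one of several common conventions, so other methods may give slightly different values for Q1 and Q3.

```python
import statistics

# Sorted age values exactly as listed in the question (27 values).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.mean(ages)        # sum of values / number of values
median_age = statistics.median(ages)    # middle value of the sorted list
modes = statistics.multimode(ages)      # every value with the highest count

# Quartile cut points (assumption: default "exclusive" method).
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(ages), q1, median_age, q3, max(ages))

print(f"mean={mean_age:.2f} median={median_age} modes={modes}")
print(f"Q1={q1} Q3={q3} IQR={iqr} five-number={five_number}")
```

Two values are returned by `multimode` when the data has two equally frequent peaks, which is what the comment on modality in part (c) is asking about.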


Question 2 [2+4+2+2=10 marks total]


The raw data set Accounts (Student, Mode, Faculty, Logins, Time, Downloads) is:


Student   Mode   Faculty   Logins   Time   Downloads

albert    ext    Science   12       120    2.7
bazza     int    Science   6        67     1.1
cathy     ext    Arts      16       320    22:9
dave      ext    Arts      20       250    ee
daffy     int    Science   15       85     1.6
fredo     ext    Arts      10       50     0.9


where Time is connection time in minutes and Downloads are in megabytes. 


(a) List all cuboids in a data cube with dimensions Mode and Faculty. 
(b) Construct the base cuboid with aggregates SUM(Logins), and MAX(Time). 
(c) Construct the Faculty cuboid for the aggregate SUM(Logins). 

(d) Construct the apex cuboid for the aggregate SUM(Logins). 
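
The cuboids asked for in parts (a) to (d) can be sketched as group-by aggregations over subsets of the dimensions. The code below is one possible illustration in plain Python, not a prescribed solution. The helper names `list_cuboids` and `build_cuboid` are hypothetical, and the Downloads column is left out because none of the questions aggregate it.

```python
from itertools import combinations

DIMENSIONS = ("Mode", "Faculty")
COLUMN = {"Mode": 1, "Faculty": 2}  # tuple positions of each dimension

# (Student, Mode, Faculty, Logins, Time) rows from the Accounts table.
records = [
    ("albert", "ext", "Science", 12, 120),
    ("bazza",  "int", "Science",  6,  67),
    ("cathy",  "ext", "Arts",    16, 320),
    ("dave",   "ext", "Arts",    20, 250),
    ("daffy",  "int", "Science", 15,  85),
    ("fredo",  "ext", "Arts",    10,  50),
]

def list_cuboids(dimensions):
    """Every subset of the dimensions, from the base cuboid down to the apex."""
    return [combo
            for size in range(len(dimensions), -1, -1)
            for combo in combinations(dimensions, size)]

def build_cuboid(rows, group_by):
    """Map each group key to (SUM(Logins), MAX(Time))."""
    cells = {}
    for row in rows:
        key = tuple(row[COLUMN[d]] for d in group_by)
        logins, longest = cells.get(key, (0, 0))
        cells[key] = (logins + row[3], max(longest, row[4]))
    return cells

cuboids = list_cuboids(DIMENSIONS)
base = build_cuboid(records, ("Mode", "Faculty"))   # base cuboid
faculty = build_cuboid(records, ("Faculty",))       # Faculty cuboid
apex = build_cuboid(records, ())                    # apex cuboid (no grouping)

for name, cuboid in (("base", base), ("faculty", faculty), ("apex", apex)):
    print(name, cuboid)
```

Grouping by the empty tuple collapses all rows into a single cell, which is why the apex cuboid holds one overall total.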


Question 3 [2+2+2+4=10 marks total]


(a) List 4 OLAP operations. 
(b) Give one example of a concept hierarchy. 


(c) Name a method for training a multilayer neural network. What gets updated 
during training? 


(d) Compute the Euclidean distance between the points (22, 1, 42, 10) and 
(20, 0, 36, 8). 
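
For part (d), the Euclidean distance is the square root of the sum of squared coordinate differences. A small sketch that computes it both by hand and with the standard library's `math.dist` (Python 3.8 or later) follows; the variable names are illustrative.

```python
import math

p = (22, 1, 42, 10)
q = (20, 0, 36, 8)

# Sum of squared differences, coordinate by coordinate.
squared_sum = sum((a - b) ** 2 for a, b in zip(p, q))
distance = math.sqrt(squared_sum)

# The standard library gives the same result directly.
distance_builtin = math.dist(p, q)

print(squared_sum, round(distance, 3))
```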


Question 4 [2+8+8+2=20 marks total]


Build a Naive Bayes classification model from the following table of six records: 


      a  b  c  d   category
R1 |  N  Y  N  N   2
R2 |  Se ye oe     1
R3 |  Y  N  N  N   1
R4 |  N  N  N  N   2
R5 |  Y  N  N  Y   2
R6 |  N  Y  Y  N   2


(a) Calculate P(c), the probability of finding category c, for all categories. 


(b) Calculate P(i|c), the probability of finding attribute i given category c, for all 
attributes and categories. 


(c) Let U=(Y,N,N,Y) be an unclassified record. Calculate P(U|c), the probability 
of finding the attribute set U given category c, for all categories. 


(d) To what category does the Naive Bayes classifier assign the attribute set U? 


Question 5 [4+4=8 marks total]


Consider the following confusion matrix for a classification model with a percent 
error of 13.8% for Class 2, and a model accuracy of 88%. 


                    Predicted
                    Class 1   Class 2   Class 3
Actual   Class 1 |    30         0         3
         Class 2 |     x        25         2
         Class 3 |     1         y        41


(a) Find the value of x (round to nearest integer). 


(b) Find the value of y (round to nearest integer). 
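
One way to approach parts (a) and (b) is to turn the two stated percentages into equations in x and y. The sketch below rests on an assumption that the question does not spell out: "percent error for Class 2" is read row-wise, as the fraction of actual Class 2 records that were misclassified. A column-wise reading (errors among records predicted as Class 2) would lead to different values, so treat this as an illustration of the method only.

```python
error_class2 = 0.138   # stated percent error for Class 2
accuracy = 0.88        # stated overall model accuracy

# Assumption (row-wise reading): (x + 2) / (x + 25 + 2) = error_class2.
# Solving for x gives x = (27 * error - 2) / (1 - error).
x_exact = (27 * error_class2 - 2) / (1 - error_class2)
x = round(x_exact)

# Accuracy = correct predictions (diagonal) / all records.
correct = 30 + 25 + 41
known_cells = 30 + 0 + 3 + 25 + 2 + 1 + 41   # every cell except x and y
# correct / (known_cells + x + y) = accuracy, solved for y.
y_exact = correct / accuracy - known_cells - x
y = round(y_exact)

print(f"x = {x_exact:.3f} -> {x}, y = {y_exact:.3f} -> {y}")
```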


Question 6 [2+6+2=10 marks total] 


Consider the table: 


(a) Calculate means for x and y. 
(b) Construct a linear model (y = b + wx) to fit the data. 


(c) Use the model to predict the y value of a record with x = 6. 